ancient document
Benchmarking Vision-Language Models on Chinese Ancient Documents: From OCR to Knowledge Reasoning
Yu, Haiyang, Wu, Yuchuan, Shi, Fan, Liao, Lei, Lu, Jinghui, Ge, Xiaodong, Wang, Han, Zhuo, Minghan, Wu, Xuecheng, Fei, Xiang, Feng, Hao, Tang, Guozhi, Wang, An-Lan, Zhu, Hanshen, He, Yangfan, Liang, Quanhuan, Meng, Liyuan, Feng, Chao, Huang, Can, Tang, Jingqun, Li, Bin
Chinese ancient documents, invaluable carriers of millennia of Chinese history and culture, hold rich knowledge across diverse fields but are difficult to digitize and understand: traditional methods only scan images, while current Vision-Language Models (VLMs) struggle with their visual and linguistic complexity. Existing document benchmarks focus on English printed texts or simplified Chinese, leaving no way to evaluate VLMs on ancient Chinese documents. To address this, we present AncientDoc, the first benchmark for Chinese ancient documents, designed to assess VLMs from OCR to knowledge reasoning. AncientDoc includes five tasks (page-level OCR, vernacular translation, reasoning-based QA, knowledge-based QA, and linguistic variant QA) and covers 14 document types, over 100 books, and about 3,000 pages. Based on AncientDoc, we evaluate mainstream VLMs using multiple metrics, supplemented by a human-aligned large language model for scoring. The benchmark is available at https://bytedance.github.io/AncientDoc.
Punctuation restoration Model and Spacing Model for Korean Ancient Document
Jang, Taehong, Ahn, Joonmo, Kim, Sojung Lucia
Korean ancient documents are written in classical Chinese characters with no spacing or punctuation, which makes it difficult for modern readers and translation models to interpret and translate them accurately. Models that predict punctuation and spacing have been developed in China, but applying them directly to Korean texts is problematic because the data differ. We therefore developed the first models that predict punctuation and spacing for Korean historical texts and evaluated their performance. Our punctuation restoration model achieved an F1 score of 0.84, and our spacing model achieved a score of 0.96. The models also allow inference on low-performance GPUs with limited VRAM while maintaining high accuracy.
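The abstract does not describe the model architecture or the exact evaluation protocol. A common way to frame punctuation restoration is as per-character labeling: each character is tagged with the punctuation mark that follows it, or "O" for none, and F1 is computed over the positions that carry a mark. The sketch below illustrates that framing under these assumptions; the label set `PUNCT` and the helpers `to_labels` and `punct_f1` are hypothetical and not taken from the paper. Spacing prediction could be framed the same way by treating a space as one more label.

```python
from typing import List, Tuple

# Hypothetical label set: full-width marks that may follow a character.
PUNCT = set("，。、；：？！")

def to_labels(text: str) -> Tuple[List[str], List[str]]:
    """Split punctuated text into bare characters and a per-character label
    naming the punctuation mark (if any) that follows each character."""
    chars: List[str] = []
    labels: List[str] = []
    for ch in text:
        if ch in PUNCT:
            if labels:  # attach the mark to the preceding character
                labels[-1] = ch
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels

def punct_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1 over positions where a mark is expected or predicted.
    A prediction counts as correct only if it is the exact gold mark."""
    assert len(gold) == len(pred)
    tp = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    if tp == 0:  # also avoids division by zero when nothing is predicted
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)

# Usage: the model input is the unpunctuated character sequence.
chars, gold = to_labels("學而時習之，不亦說乎。")
pred = ["O"] * len(chars)
pred[4] = "，"  # correct mark after 之
pred[8] = "，"  # wrong mark after 乎 (gold is 。)
print("".join(chars), punct_f1(gold, pred))  # 學而時習之不亦說乎 0.5
```

Requiring an exact match on the mark (rather than only on its position) is one design choice; a position-only variant would score the example above as 1.0.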
DeepScribe AI Can Help Translate Ancient Tablets
Researchers from the University of Chicago's Oriental Institute and Department of Computer Science have collaborated to design an AI that can help decode tablets from ancient civilizations. According to Phys.org, the AI, called DeepScribe, was trained on over 6,000 annotated images drawn from the Persepolis Fortification Archive; once complete, the model will be able to interpret unanalyzed tablets, making ancient documents easier to study. Experts who study ancient documents, such as the researchers studying documents created during the Achaemenid Empire in Persia, must translate them by hand, a long and error-prone process. Researchers have used computers to help interpret ancient documents since the 1990s, but those programs were of limited help: the complex cuneiform characters and the three-dimensional shape of the tablets limited how useful they could be.